Acoustic signature matching of audio content

ABSTRACT

Various embodiments relating to identifying an acoustic signature of an audio content item are provided. In one embodiment, an audio subsample of a test audio content item may be compared with corresponding audio subsamples of each of a plurality of catalog audio content items. If the audio subsample of the test audio content item matches the corresponding audio subsamples of two or more catalog audio content items, those catalog audio content items may be selected as candidate audio content items. A complete audio sample of the test audio content item may be compared to corresponding complete audio samples of each of the candidate audio content items. One of the candidate audio content items may be selected as a matching audio content item.

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Provisional Patent Application No. 61/655,406, filed Jun. 4, 2012 and entitled MULTI-SCREEN MEDIA DELIVERY, the entirety of which is hereby incorporated herein by reference for all purposes.

BACKGROUND

Various applications may employ acoustic audio identification to identify audio content, such as to provide identifying information for an unknown song. Existing approaches for providing acoustic audio identification (a.k.a., acoustic fingerprinting) typically analyze a small portion (e.g., 15 seconds) of an audio file, because analyzing the entire audio file can be an expensive process, both in regard to processing resources and other system costs. However, because these approaches only analyze a small portion of an audio file, in cases where different audio files have only very minor differences, such audio files cannot be easily differentiated and identified. For example, such acoustic fingerprinting approaches may be incapable of differentiating between an explicit version and a censored version of the same song, as the analyzed portion of the songs may be the same, but other portions of the song may be different.

SUMMARY

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.

Various embodiments relating to identifying an acoustic signature of an audio content item are provided. In one embodiment, an audio subsample of a test audio content item may be compared with corresponding audio subsamples of each of a plurality of catalog audio content items. If the audio subsample of the test audio content item matches the corresponding audio subsamples of two or more catalog audio content items, those catalog audio content items may be selected as candidate audio content items. A complete audio sample of the test audio content item may be compared to corresponding complete audio samples of each of the candidate audio content items. One of the candidate audio content items may be selected as a matching audio content item.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a computing system according to an embodiment of the present disclosure.

FIGS. 2-3 show a method of identifying an acoustic signature of a test audio content item according to an embodiment of the present disclosure.

FIG. 4 shows a computing system according to an embodiment of the present disclosure.

DETAILED DESCRIPTION

This description relates to identifying an acoustic signature of an audio content item (a.k.a., acoustic fingerprinting). More particularly, this description relates to an acoustic fingerprinting approach that includes a two-pass process to identify an audio content item by comparing the audio content item (referred to herein as the ‘test audio content item’) to a plurality of audio content items in a catalog (referred to herein as ‘catalog audio content items’). In particular, the first pass may include comparing an audio subsample (e.g., a 15 second clip) of a test audio content item with corresponding audio subsamples of each of a plurality of catalog audio content items. If the audio subsample of the test audio content item matches the corresponding audio subsamples of two or more catalog audio content items, those catalog audio content items may be selected as candidate audio content items. The second pass may include comparing a complete audio sample of the test audio content item to corresponding complete audio samples of each of the candidate audio content items. One of the candidate audio content items may be selected as a matching audio content item based on matching criteria.

By identifying candidates in the first step and comparing complete audio samples in the second step, accuracy of acoustic identification may be increased relative to an approach that merely compares audio subsamples. Moreover, processing performance may be increased relative to an approach that compares a complete audio sample of a test audio content item to complete audio samples of all catalog audio content items.

FIG. 1 shows a computing system 100 in accordance with an embodiment of the present disclosure. The computing system 100 includes a plurality of client computing machines (represented by a client computing machine 102, referred to herein as the ‘client’). The plurality of client computing machines may be in communication with an audio identification service computing machine 110 (referred to herein as the ‘audio identification service’) over a network 108, such as the Internet. In particular, the clients may send requests to the audio identification service to acoustically fingerprint or identify different audio content items. Further, other services may send requests to the audio identification service to acoustically fingerprint or identify different audio content items, such as a music management service or the like. Moreover, it is to be understood that the audio identification service may acoustically fingerprint or identify an audio content item without a request from another entity. In other words, the audio identification service may initiate acoustic identification of an audio content item.

It should be understood that virtually any number of different clients may be in communication with the audio identification service without departing from the scope of this disclosure. Non-limiting examples of clients may include desktop computers, laptop computers, smart phones, tablet computers, gaming consoles, set-top boxes, networked televisions, networked stereos, mobile devices, and any other suitable computing machine.

The client 102 includes a library 104 of audio content items. The library of audio content items may include any suitable type of audio content item including any suitable audio file or audio recording, such as a song or other music, audio book or other spoken word, sound effect, movie with audio component, etc. The library of audio content items may include any suitable number of audio content items. For example, the library may include a collection of songs that a user has purchased via an online marketplace, uploaded from compact discs or other media, or otherwise acquired. In some embodiments, the library may be at least partially stored locally at the client. In some embodiments, the library may be stored remotely from the client, and may be accessed by the client (e.g., the library may include pointers that point to remote storage locations of corresponding audio content items). Although the fingerprinting and identification concepts are described in the context of sound and acoustic signatures, it is to be understood that these concepts are broadly applicable to visual or video fingerprinting or identification. In some embodiments, the identification service may be configured to visually fingerprint or identify video content or other imagery additionally or instead of fingerprinting or identifying audio content.

The library 104 may include a test audio content item 106 that may be acoustically fingerprinted or identified by the audio identification service. The test audio content item may be representative of any audio content item in the library. It is to be understood that the test audio content item may be acoustically fingerprinted or identified for any suitable reason or as part of any suitable operation without departing from the scope of the present disclosure. For example, the test audio content item may be acoustically identified as part of a copyright compliance, licensing, or other appropriate scheme.

The audio identification service 110 may be configured to acoustically fingerprint or identify an audio signature of a test audio content item from a client. More particularly, the audio identification service may be configured to perform a two-pass identification process that increases the accuracy of acoustic identification while maintaining good processing performance.

In some embodiments, the audio identification service may receive at least some portion of the test audio content item (e.g., song bits or data) from the client, such as a subsample or complete sample of the test audio content item, and the audio identification service may generate the acoustic fingerprint to perform the identification analysis. In some embodiments, acoustic fingerprints of the subsample and/or the complete sample of the test audio content item may be generated locally at the client, and the acoustic fingerprints may be sent to the audio identification service to perform the identification analysis. In some embodiments, the audio identification service may be integrated with the client and the acoustic fingerprinting and identification analysis may be performed locally at the client.

The audio identification service 110 may include a catalog 112 including a plurality of catalog audio content items 114. The catalog may include any suitable type of audio content item including any suitable audio file or audio recording, such as a song or other music, audio book or other spoken word, sound effect, etc. For example, the catalog may include different versions of the same song that may have minimal differences (e.g., an explicit version and a censored version), and may be treated as different audio content items. As another example, the catalog may include different versions of the same song that are acoustically identical, but have other differences, such as different metadata, licensing, etc. These acoustically identical songs may be treated as different audio content items.

The catalog of audio content items may include any suitable number of audio content items. Generally, the catalog may include an entire collection of audio content items, whereas a client's library may include a subset of the collection of audio content items. However, in some embodiments, the catalog and a client's library may have the same collection of audio content items. Further, in some embodiments, a client's library may include one or more audio content items that are not included in the catalog, or that may be added to the catalog upon performing an acoustic identification as a test audio content item.

The audio identification service may be configured to identify an acoustic signature of a test audio content item. For example, the audio identification service may be configured to receive an identification of the test audio content item from a client. In one example, the identification may include metadata associated with the test audio content item. For example, the metadata may include an artist, an album, a song title, a duration, a file name, a folder name, a track number, a release year, or any other suitable information to identify the audio content item. In another example, the audio identification service may be configured to receive a hash file of the test audio content item. The hash file may be used to determine if the test audio content item has been previously acoustically identified by the audio identification service. If the hash file identifies a catalog audio content item, the audio identification service may be configured to select that catalog audio content item as a matching audio content item that matches an acoustic signature of the test audio content item. If the hash file identifies a catalog audio content item as a matching audio content item, the identification process does not have to be performed, because it has been performed previously for the test audio content item. If the hash file does not identify a catalog audio content item, the audio identification service may continue with the identification process. It is to be understood that the hash file comparison does not involve a comparison of acoustic signatures or fingerprints. For example, the hash file may include a relationship between the identification of the test audio content item and a matching catalog audio content item. This relationship may be used to look up a matching audio content item.
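The hash-based shortcut described above may be illustrated with a minimal sketch. The names `previous_matches`, `file_hash`, `lookup_previous_match`, and `record_match` are hypothetical, and the use of SHA-256 over the raw file bytes is an assumption; the disclosure does not specify a hash function or storage layout.

```python
import hashlib
from typing import Dict, Optional

# Hypothetical store: file hash -> ID of the catalog item matched previously.
previous_matches: Dict[str, str] = {}


def file_hash(data: bytes) -> str:
    """Hash the raw file bytes. SHA-256 is an assumed choice."""
    return hashlib.sha256(data).hexdigest()


def lookup_previous_match(data: bytes) -> Optional[str]:
    """Return the catalog item ID recorded for this exact file, or None."""
    return previous_matches.get(file_hash(data))


def record_match(data: bytes, catalog_id: str) -> None:
    """Remember the outcome of a completed identification."""
    previous_matches[file_hash(data)] = catalog_id
```

Because the lookup keys on a hash of the file rather than on an acoustic fingerprint, any bitwise change to the file misses the shortcut and falls through to the acoustic comparison.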

The audio identification service may be configured to compare an audio subsample of the test audio content item with corresponding audio subsamples of each of the plurality of catalog audio content items in the catalog. For example, the audio subsample and the corresponding audio subsamples may have a same duration and a same temporal position or offset. In one particular example, an audio subsample may have a duration of 15 seconds, and may be offset 1 minute from the beginning of a track. In some embodiments, the audio subsample may be predefined. In some embodiments, the audio subsample may be received from the client. In some embodiments, the test audio content item may be acoustically analyzed to identify an audio subsample that may be individualized or unique in order to reduce a possibility of misidentifying the test audio content item.
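As a rough illustration of taking a subsample at a fixed duration and temporal offset, the following sketch slices a sequence of decoded audio samples. The function name `extract_subsample` and the representation of audio as a flat sequence of floats are assumptions made for illustration only.

```python
from typing import List, Sequence


def extract_subsample(
    samples: Sequence[float],
    sample_rate: int,
    offset_s: float = 60.0,
    duration_s: float = 15.0,
) -> List[float]:
    """Return the samples in [offset_s, offset_s + duration_s).

    The defaults mirror the example of a 15 second clip offset 1 minute
    from the beginning of a track. A track shorter than the requested
    window yields a shorter (possibly empty) result.
    """
    start = int(offset_s * sample_rate)
    end = start + int(duration_s * sample_rate)
    return list(samples[start:end])
```

Applying the same offset and duration to the test item and to each catalog item yields the corresponding subsamples that are compared in the first pass.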

It is to be understood that the audio subsamples may be compared according to any suitable acoustic fingerprinting or identification technology without departing from this description. For example, an acoustic fingerprinting comparison and identification algorithm may take into account perceptual audio characteristics of an audio content item. In other words, when comparing acoustic fingerprints of two audio content items, if two audio content items sound alike to the human ear, their acoustic fingerprints should match, even if their bitwise representations are quite different. For example, a comparison of acoustic fingerprints may contemplate perceptual characteristics such as average zero crossing rate, estimated tempo, average spectrum, spectral flatness, prominent tones across a set of bands, and bandwidth. Furthermore, differences in frequency, amplitude, and/or other parameters may be considered by a comparison of acoustic fingerprints.
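One of the perceptual characteristics named above, the average zero crossing rate, can be computed directly from the samples. The sketch below is one common definition and is not taken from the disclosure; the function name is hypothetical, and treating a sample value of exactly zero as non-negative is an assumption.

```python
from typing import Sequence


def zero_crossing_rate(samples: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ.

    A value of exactly zero is treated as non-negative (an assumption).
    """
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1
        for a, b in zip(samples, samples[1:])
        if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)
```

Two encodings of the same recording would be expected to yield similar rates even when their bitwise representations differ, which is why such characteristics are suited to perceptual comparison.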

In some embodiments, two audio content items may be determined to match if one or more of the above characteristics of each of the two audio content items are within a corresponding threshold value of each other. In other words, two acoustic signatures may be determined to match if relative changes of given characteristics are reasonably the same even if absolute values are slightly different. In some embodiments, two acoustic signatures may be determined to match if any suitable portions of the two samples match within a threshold value. For example, if an acoustic signature of a sample temporally positioned at 11-15 seconds of a test song matches an acoustic signature of a sample temporally positioned at 9-13 seconds of a catalog song, then the two songs may be determined to be matching. Such determinations may allow for small changes in data, such as timing shifts or changes in other characteristics.
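The threshold comparison may be sketched as follows. The function name `features_match`, the representation of a fingerprint as a dictionary of named characteristics, and the rule that every characteristic listed in `thresholds` must agree are all assumptions; a caller may list a single characteristic or several.

```python
from typing import Dict


def features_match(
    test: Dict[str, float],
    catalog: Dict[str, float],
    thresholds: Dict[str, float],
) -> bool:
    """True when each characteristic named in `thresholds` is present in
    both fingerprints and differs by no more than its threshold."""
    if not thresholds:
        return False  # nothing to compare, so no match is claimed
    return all(
        name in test
        and name in catalog
        and abs(test[name] - catalog[name]) <= limit
        for name, limit in thresholds.items()
    )
```

For example, fingerprints with tempos of 120.0 and 121.0 match under a tempo threshold of 2.0 but not under a threshold of 0.5.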

In some embodiments, a parameter of the test audio content item may be used to reduce a number of catalog audio content items involved in the comparison of the audio subsample of the test audio content item with audio subsamples of the catalog audio content items. For example, the number of catalog audio content items may be reduced based on a duration (e.g., the duration of the complete sample) of the test audio content item. In particular, catalog audio content items that have a duration that is different from the duration of the test audio content item by more than a threshold value may be omitted from the comparison of the audio subsamples. It is to be understood that any suitable parameter may be used to narrow down the number of catalog audio content items involved in the first pass comparison of audio subsamples. Accordingly, the first pass comparison may be performed more quickly than a comparison that involves all catalog audio content items.
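The duration-based reduction may be sketched as a simple filter. The `CatalogItem` type, the function name, and the default tolerance of 2 seconds are hypothetical; the disclosure does not give a threshold value.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CatalogItem:
    item_id: str
    duration_s: float


def prefilter_by_duration(
    test_duration_s: float,
    catalog: List[CatalogItem],
    tolerance_s: float = 2.0,  # assumed value for illustration
) -> List[CatalogItem]:
    """Keep only catalog items whose duration is within `tolerance_s`
    of the test item's duration."""
    return [
        item
        for item in catalog
        if abs(item.duration_s - test_duration_s) <= tolerance_s
    ]
```

Only the items that survive this filter would then take part in the first pass comparison of audio subsamples.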

In some embodiments, if the audio subsample of the test audio content item does not match any of the corresponding audio subsamples of the catalog audio content items, the audio identification service may be configured to report that the test audio content item does not match any of the catalog audio content items. In some embodiments, the audio identification service may be configured to convert the test audio content item to a catalog audio content item. In other words, the test audio content item may be added to the catalog. In some embodiments, if the audio subsample of the test audio content item does not match any of the corresponding audio subsamples of the catalog audio content items, another audio subsample of the test audio content item may be compared to corresponding audio subsamples of the catalog audio content items. For example, the audio subsample may have a different duration (e.g., the duration may be increased) or a different temporal position (e.g., the audio subsample may start 2 minutes from a beginning of the track).
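The retry with a different audio subsample may be sketched as a loop over candidate windows. The function name and the `match_window` callback, which stands in for whatever first pass comparison is used, are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def first_pass_with_retries(
    windows: Iterable[Tuple[float, float]],
    match_window: Callable[[float, float], List[str]],
) -> List[str]:
    """Try each (offset_s, duration_s) window in order and return the
    catalog item IDs matched by the first window that matches anything.

    Returns an empty list when no window produces a match.
    """
    for offset_s, duration_s in windows:
        matches = match_window(offset_s, duration_s)
        if matches:
            return matches
    return []
```

A caller might pass windows such as (60, 15), then (120, 15), then (60, 30), so that a later window is compared only when the earlier ones match nothing.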

If the audio subsample of the test audio content item matches the corresponding audio subsample of only one catalog audio content item, the audio identification service may be configured to select that catalog audio content item as a matching audio content item that matches an acoustic signature of the test audio content item.

If the audio subsample of the test audio content item matches the corresponding audio subsamples of two or more catalog audio content items, the audio identification service may be configured to select those catalog audio content items as candidate audio content items. The audio identification service may be configured to compare a complete audio sample of the test audio content item to corresponding complete audio samples of each of the candidate audio content items. For example, the complete audio sample may be an entire audio duration of the test audio content item.

If the complete audio sample of the test audio content item does not match corresponding audio samples of any of the candidate audio content items, the audio identification service may be configured to report that the test audio content item does not match any of the catalog audio content items. In some embodiments, the audio identification service may be configured to convert the test audio content item to a catalog audio content item. In other words, the test audio content item may be added to the catalog.

If the complete audio sample of the test audio content item matches the corresponding complete audio sample of only one candidate audio content item, the audio identification service may be configured to select that candidate audio content item as the matching audio content item that matches an acoustic signature of the test audio content item.

If the complete audio sample of the test audio content item matches corresponding complete audio samples of two or more candidate audio content items, the audio identification service may be configured to select one of the two or more candidate audio content items that has metadata that most closely matches corresponding metadata of the test audio content item as the matching audio content item. It is to be understood that any desired metadata may be used as the secondary matching criteria to select the matching audio content item. For example, if two or more candidate audio content items are acoustically identical, the version that has a license to be played in a region associated with the client may be selected as the matching audio content item. In some embodiments, information other than metadata may be used as secondary matching criteria to select a candidate as a matching audio content item.
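One possible form of the metadata-based secondary criterion is sketched below. Scoring a candidate by the number of metadata fields that exactly equal those of the test item is an assumption, as are the function names; the disclosure leaves the notion of "most closely matches" open.

```python
from typing import Dict


def metadata_score(test_meta: Dict[str, str], candidate_meta: Dict[str, str]) -> int:
    """Count the test item's metadata fields that the candidate matches exactly."""
    return sum(
        1 for key, value in test_meta.items() if candidate_meta.get(key) == value
    )


def pick_by_metadata(
    test_meta: Dict[str, str],
    candidates: Dict[str, Dict[str, str]],
) -> str:
    """Return the ID of the candidate with the highest metadata score.

    `candidates` maps candidate ID -> metadata and must be non-empty.
    Ties go to the candidate that appears first in the mapping.
    """
    return max(
        candidates,
        key=lambda cid: metadata_score(test_meta, candidates[cid]),
    )
```

Other secondary criteria, such as whether a version is licensed for the region associated with the client, could be substituted by changing the scoring function.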

In some embodiments, the audio identification service may be configured to select one of the candidate audio content items as a matching audio content item based on matching criteria. For example, the matching criteria may include the above described cases. In another example, if none of the candidates match the test audio content item, then a next closest item may be selected as a matching audio content item. For example, if the test audio content item is an explicit version of a song that is not included in the catalog, then a censored version of the same song that is included in the catalog may be provided as a matching song.

In some embodiments, the audio identification service may be stateless. For example, instead of keeping temporary results of a comparison of audio subsamples on a server side, the results may be returned to the client with the first pass results, and sent back to the server with a second pass request. Moreover, if the audio identification service identifies candidate audio content items, the candidates may be reported to the client. In some embodiments, information from a given identification session or routine may be stored in a cache shared across instances at the audio identification service. For example, when a service instance replies to a first-pass request with a response requesting a second pass, a requestID and intermediate results (e.g., candidates provided from the audio subsample comparison) may be stored in the shared cache (e.g., with expiration after a few minutes). Subsequently, when another service instance receives the corresponding second-pass request, the service instance may look up the requestID in the shared cache and retrieve the intermediate results in order to process the second pass.
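The shared-cache variant may be sketched with a small in-process stand-in. The class name `SharedCache`, the 180 second expiration, and the use of a random UUID as the requestID are assumptions; an actual deployment would presumably use a cache reachable by all service instances rather than a local dictionary.

```python
import time
import uuid
from typing import Callable, Dict, List, Optional, Tuple


class SharedCache:
    """In-memory stand-in for a cache shared across service instances."""

    def __init__(
        self,
        ttl_s: float = 180.0,  # assumed "few minutes" expiration
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def put(self, candidates: List[str]) -> str:
        """Store first-pass candidates and return a new requestID."""
        request_id = uuid.uuid4().hex
        self._entries[request_id] = (self._clock() + self._ttl_s, list(candidates))
        return request_id

    def get(self, request_id: str) -> Optional[List[str]]:
        """Return the stored candidates, or None if unknown or expired."""
        entry = self._entries.get(request_id)
        if entry is None:
            return None
        expires_at, candidates = entry
        if self._clock() >= expires_at:
            del self._entries[request_id]
            return None
        return candidates
```

The instance that handles the first pass would call `put` and return the requestID to the client; whichever instance receives the second-pass request would call `get` with that requestID and fall back to repeating the first pass if the entry has expired.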

FIGS. 2-3 show a method 200 of identifying an acoustic signature of a test audio content item according to an embodiment of the present disclosure. For example, the method 200 may be performed by the audio identification service computing machine 110 shown in FIG. 1.

At 202, the method 200 may include receiving a hash file of the test audio content item.

At 204, the method 200 may include determining whether the hash file identifies a catalog audio content item of a plurality of catalog audio content items. If the hash file identifies a catalog audio content item of the plurality of catalog audio content items, then the method 200 moves to 206. Otherwise, the method 200 moves to 208.

At 206, the method 200 may include selecting the catalog audio content item identified from the hash file as a matching audio content item, and returning to other operations.

At 208, the method 200 may include comparing an audio subsample of the test audio content item with corresponding audio subsamples of each of the plurality of catalog audio content items.

At 210, the method 200 may include determining how many audio subsamples of catalog audio content items match the audio subsample of the test audio content item. If the audio subsample of the test audio content item does not match corresponding audio subsamples of any of the catalog audio content items, then the method 200 moves to 222. If the audio subsample of the test audio content item matches the corresponding audio subsample of only one catalog audio content item, then the method 200 moves to 220. If the audio subsample of the test audio content item matches the corresponding audio subsamples of two or more catalog audio content items, then the method 200 moves to 212.

At 212, the method 200 may include selecting catalog audio content items that have matching audio subsamples as candidate audio content items.

At 214, the method 200 may include comparing a complete audio sample of the test audio content item to corresponding complete audio samples of each of the candidate audio content items.

At 216 of FIG. 3, the method 200 may include determining how many complete audio samples of catalog audio content items match the complete audio sample of the test audio content item. If the complete audio sample of the test audio content item does not match corresponding complete audio samples of any of the catalog audio content items, then the method 200 moves to 222. If the complete audio sample of the test audio content item matches the corresponding complete audio sample of only one catalog audio content item, then the method 200 moves to 220. If the complete audio sample of the test audio content item matches the corresponding complete audio samples of two or more catalog audio content items, then the method 200 moves to 218.

At 218, the method 200 may include selecting one of the two or more candidate audio content items as the matching audio content item based on secondary matching criteria, and returning to other operations. For example, the secondary matching criteria may include selecting one of the candidate audio content items that has metadata that most closely matches corresponding metadata of the test audio content item as the matching audio content item. It is to be understood that any suitable secondary matching criteria may be employed to select a candidate audio content item as a matching audio content item.

At 220, the method 200 may include selecting a catalog audio content item that has been identified as having the only matching audio subsample or the only matching complete audio sample as the matching audio content item, and returning to other operations.

At 222, the method 200 may include reporting that the test audio content item does not match any of the catalog audio content items.

At 224, the method 200 may include converting the test audio content item to a catalog audio content item. In other words, the test audio content item may be added to the catalog as a catalog audio content item.
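The control flow of steps 202 through 222 may be summarized in a short sketch. The function `identify` and its callback parameters are hypothetical; the callbacks stand in for whatever subsample comparison, complete-sample comparison, and secondary matching criteria are used, and the conversion at 224 is left to the caller.

```python
from typing import Callable, Dict, List, Optional


def identify(
    test_id: str,
    catalog_ids: List[str],
    subsample_matches: Callable[[str, str], bool],
    complete_matches: Callable[[str, str], bool],
    pick_secondary: Callable[[str, List[str]], str],
    hash_index: Optional[Dict[str, str]] = None,
) -> Optional[str]:
    """Return the matching catalog item ID, or None when nothing matches."""
    # 202-206: a previously recorded match short-circuits the comparison.
    if hash_index and test_id in hash_index:
        return hash_index[test_id]

    # 208-212: first pass over audio subsamples.
    candidates = [c for c in catalog_ids if subsample_matches(test_id, c)]
    if not candidates:
        return None  # 222: report no match
    if len(candidates) == 1:
        return candidates[0]  # 220: only one matching subsample

    # 214-216: second pass over complete audio samples, candidates only.
    full = [c for c in candidates if complete_matches(test_id, c)]
    if not full:
        return None  # 222: report no match
    if len(full) == 1:
        return full[0]  # 220: only one matching complete sample

    # 218: two or more complete matches, apply secondary criteria.
    return pick_secondary(test_id, full)
```

The complete-sample comparison is invoked only for items that survived the first pass, which is the source of the performance benefit relative to comparing complete samples against the entire catalog.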

By identifying candidates in the first step and comparing complete audio samples in the second step, accuracy of acoustic identification may be increased relative to an approach that merely compares audio subsamples. Moreover, processing performance may be increased relative to an approach that compares a complete audio sample of a test audio content item to complete audio samples of all catalog audio content items.

In some embodiments, the methods and processes described herein may be tied to a computing system of one or more computing devices. In particular, such methods and processes may be implemented as a computer-application program or service, an application-programming interface (API), a library, and/or other computer-program product.

FIG. 4 schematically shows a non-limiting embodiment of a computing system 400 that can enact one or more of the methods and processes described above. For example, computing system 400 may be representative of the client computing machine 102 or the audio identification service computing machine 110 shown in FIG. 1. Computing system 400 is shown in simplified form. Computing system 400 may take the form of one or more personal computers, server computers, tablet computers, home-entertainment computers, network computing devices, gaming devices, mobile computing devices, mobile communication devices (e.g., smart phone), and/or other computing devices.

Computing system 400 includes a logic machine 402 and a storage machine 404. Computing system 400 may optionally include a display subsystem 406, input subsystem 408, communication subsystem 410, and/or other components not shown in FIG. 4.

Logic machine 402 includes one or more physical devices configured to execute instructions. For example, the logic machine may be configured to execute instructions that are part of one or more applications, services, programs, routines, libraries, objects, components, data structures, or other logical constructs. Such instructions may be implemented to perform a task, implement a data type, transform the state of one or more components, achieve a technical effect, or otherwise arrive at a desired result.

The logic machine may include one or more processors configured to execute software instructions. Additionally or alternatively, the logic machine may include one or more hardware or firmware logic machines configured to execute hardware or firmware instructions. Processors of the logic machine may be single-core or multi-core, and the instructions executed thereon may be configured for sequential, parallel, and/or distributed processing. Individual components of the logic machine optionally may be distributed among two or more separate devices, which may be remotely located and/or configured for coordinated processing. Aspects of the logic machine may be virtualized and executed by remotely accessible, networked computing devices configured in a cloud-computing configuration.

Storage machine 404 includes one or more physical devices configured to hold instructions executable by the logic machine to implement the methods and processes described herein. When such methods and processes are implemented, the state of storage machine 404 may be transformed—e.g., to hold different data.

Storage machine 404 may include removable and/or built-in devices. Storage machine 404 may include optical memory (e.g., CD, DVD, HD-DVD, Blu-Ray Disc, etc.), semiconductor memory (e.g., RAM, EPROM, EEPROM, etc.), and/or magnetic memory (e.g., hard-disk drive, floppy-disk drive, tape drive, MRAM, etc.), among others. Storage machine 404 may include volatile, nonvolatile, dynamic, static, read/write, read-only, random-access, sequential-access, location-addressable, file-addressable, and/or content-addressable devices.

It will be appreciated that storage machine 404 includes one or more physical devices. However, aspects of the instructions described herein alternatively may be propagated by a communication medium (e.g., an electromagnetic signal, an optical signal, etc.) that is not held by a physical device for a finite duration.

Aspects of logic machine 402 and storage machine 404 may be integrated together into one or more hardware-logic components. Such hardware-logic components may include field-programmable gate arrays (FPGAs), program- and application-specific integrated circuits (PASIC/ASICs), program- and application-specific standard products (PSSP/ASSPs), system-on-a-chip (SOC), and complex programmable logic devices (CPLDs), for example.

It will be appreciated that a “service”, as used herein, is an application program executable across multiple user sessions. A service may be available to one or more system components, programs, and/or other services. In some implementations, a service may run on one or more server-computing devices.

When included, display subsystem 406 may be used to present a visual representation of data held by storage machine 404. This visual representation may take the form of a graphical user interface (GUI). As the herein described methods and processes change the data held by the storage machine, and thus transform the state of the storage machine, the state of display subsystem 406 may likewise be transformed to visually represent changes in the underlying data. Display subsystem 406 may include one or more display devices utilizing virtually any type of technology. Such display devices may be combined with logic machine 402 and/or storage machine 404 in a shared enclosure, or such display devices may be peripheral display devices.

When included, input subsystem 408 may comprise or interface with one or more user-input devices such as a keyboard, mouse, touch screen, or game controller. In some embodiments, the input subsystem may comprise or interface with selected natural user input (NUI) componentry. Such componentry may be integrated or peripheral, and the transduction and/or processing of input actions may be handled on- or off-board. Example NUI componentry may include a microphone for speech and/or voice recognition; an infrared, color, stereoscopic, and/or depth camera for machine vision and/or gesture recognition; a head tracker, eye tracker, accelerometer, and/or gyroscope for motion detection and/or intent recognition; as well as electric-field sensing componentry for assessing brain activity.

When included, communication subsystem 410 may be configured to communicatively couple computing system 400 with one or more other computing devices. Communication subsystem 410 may include wired and/or wireless communication devices compatible with one or more different communication protocols. As non-limiting examples, the communication subsystem may be configured for communication via a wireless telephone network, or a wired or wireless local- or wide-area network. In some embodiments, the communication subsystem may allow computing system 400 to send and/or receive messages to and/or from other devices via a network such as the Internet.

It will be understood that the configurations and/or approaches described herein are exemplary in nature, and that these specific embodiments or examples are not to be considered in a limiting sense, because numerous variations are possible. The specific routines or methods described herein may represent one or more of any number of processing strategies. As such, various acts illustrated and/or described may be performed in the sequence illustrated and/or described, in other sequences, in parallel, or omitted. Likewise, the order of the above-described processes may be changed.

The subject matter of the present disclosure includes all novel and nonobvious combinations and subcombinations of the various processes, systems and configurations, and other features, functions, acts, and/or properties disclosed herein, as well as any and all equivalents thereof. 

1. A method of identifying an acoustic signature of a test audio content item, the method comprising: comparing an audio subsample of the test audio content item with corresponding audio subsamples of each of a plurality of catalog audio content items; if the audio subsample of the test audio content item matches the corresponding audio subsamples of two or more catalog audio content items, selecting those catalog audio content items as candidate audio content items; comparing a complete audio sample of the test audio content item to corresponding complete audio samples of each of the candidate audio content items; and selecting one of the candidate audio content items as a matching audio content item.
 2. The method of claim 1, wherein, if the complete audio sample of the test audio content item matches the corresponding complete audio sample of only one of the candidate audio content items, then that candidate audio content item is selected as the matching audio content item.
 3. The method of claim 1, wherein, if the complete audio sample of the test audio content item matches corresponding complete audio samples of two or more candidate audio content items, then one of the two or more candidate audio content items is selected as the matching audio content item based on secondary matching criteria.
 4. The method of claim 3, wherein the secondary matching criteria includes metadata.
 5. The method of claim 1, further comprising: if the audio subsample of the test audio content item matches the corresponding audio subsample of only one catalog audio content item, selecting that catalog audio content item as the matching audio content item.
 6. The method of claim 1, further comprising: if the complete audio sample of the test audio content item does not match corresponding audio samples of any of the candidate audio content items, reporting that the test audio content item does not match any of the catalog audio content items.
 7. The method of claim 6, further comprising: converting the test audio content item to a catalog audio content item.
 8. The method of claim 1, wherein the audio subsample of the test audio content item and the corresponding audio subsamples of the catalog audio content items have a same duration and a same temporal position.
 9. The method of claim 1, wherein the complete audio sample is an entire audio duration of the test audio content item.
 10. A method of identifying an acoustic signature of a test audio content item, the method comprising: comparing an audio subsample of the test audio content item with corresponding audio subsamples of each of a plurality of catalog audio content items; if the audio subsample of the test audio content item matches the corresponding audio subsample of only one catalog audio content item, selecting that catalog audio content item as a matching audio content item; if the audio subsample of the test audio content item matches the corresponding audio subsamples of two or more catalog audio content items, selecting those catalog audio content items as candidate audio content items; comparing a complete audio sample of the test audio content item to corresponding complete audio samples of each of the candidate audio content items; and if the complete audio sample of the test audio content item matches the corresponding complete audio sample of only one candidate audio content item, selecting that candidate audio content item as the matching audio content item.
 11. The method of claim 10, further comprising: if the complete audio sample of the test audio content item matches corresponding complete audio samples of two or more candidate audio content items, selecting one of the two or more candidate audio content items as the matching audio content item based on secondary matching criteria.
 12. The method of claim 11, wherein the secondary matching criteria includes metadata.
 13. The method of claim 10, further comprising: if the audio subsample of the test audio content item does not match corresponding audio subsamples of any of the catalog audio content items, reporting that the test audio content item does not match any of the catalog audio content items and converting the test audio content item to a catalog content item.
 14. The method of claim 10, further comprising: if the complete audio sample of the test audio content item does not match corresponding complete audio samples of any of the candidate audio content items, reporting that the test audio content item does not match any of the catalog audio content items; and converting the test audio content item to a catalog content item.
 15. The method of claim 10, wherein the audio subsample of the test audio content item and the corresponding audio subsamples of the catalog audio content items have a same duration and a same temporal position.
 16. The method of claim 10, wherein the complete audio sample is an entire audio duration of the test audio content item.
 17. A method of identifying an acoustic signature of a test audio content item, the method comprising: receiving a hash file of the test audio content item; if the hash file identifies a catalog audio content item of a plurality of catalog audio content items, selecting that catalog audio content item as a matching audio content item; if the hash file does not identify a catalog audio content item, comparing an audio subsample of the test audio content item with corresponding audio subsamples of each of the plurality of catalog audio content items; if the audio subsample of the test audio content item matches the corresponding audio subsample of only one catalog audio content item, selecting that catalog audio content item as a matching audio content item; if the audio subsample of the test audio content item matches the corresponding audio subsamples of two or more catalog audio content items, selecting those catalog audio content items as candidate audio content items; comparing a complete audio sample of the test audio content item to corresponding complete audio samples of each of the candidate audio content items; and if the complete audio sample of the test audio content item matches the corresponding complete audio sample of only one candidate audio content item, selecting that candidate audio content item as the matching audio content item.
 18. The method of claim 17, further comprising: if the complete audio sample of the test audio content item matches corresponding complete audio samples of two or more candidate audio content items, selecting one of the two or more candidate audio content items that has metadata that most closely matches corresponding metadata of the test audio content item as the matching audio content item.
 19. The method of claim 17, wherein the audio subsample of the test audio content item and the corresponding audio subsamples of the catalog audio content items have a same duration and a same temporal position.
 20. The method of claim 17, wherein the complete audio sample is an entire audio duration of the test audio content item. 
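
As a non-limiting illustration, the cascade recited in claims 17 and 18 (hash lookup, then subsample comparison, then complete-sample comparison, then metadata as a tie-breaker) may be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the names AudioItem, identify, and metadata_score are hypothetical, fingerprints are modeled as plain tuples, and "matching" is modeled as exact equality, whereas a practical system would likely use a similarity measure with a threshold.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Sequence


@dataclass
class AudioItem:
    """Hypothetical record for a test or catalog audio content item."""
    name: str
    file_hash: str
    subsample: tuple   # assumed fingerprint of a fixed-duration, fixed-position excerpt
    complete: tuple    # assumed fingerprint of the entire audio duration
    metadata: dict = field(default_factory=dict)


def metadata_score(test_meta: dict, catalog_meta: dict) -> int:
    """Count metadata fields of the test item that agree with the catalog item."""
    return sum(1 for key, value in test_meta.items() if catalog_meta.get(key) == value)


def identify(
    test: AudioItem,
    catalog: Sequence[AudioItem],
    matches: Callable[[tuple, tuple], bool] = lambda a, b: a == b,  # assumption: exact match
) -> Optional[AudioItem]:
    """Return the matching catalog item, or None if no catalog item matches."""
    # Hash lookup: an identifying hash selects the catalog item directly.
    for item in catalog:
        if item.file_hash == test.file_hash:
            return item

    # Subsample comparison against every catalog item.
    candidates = [c for c in catalog if matches(test.subsample, c.subsample)]
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]

    # Complete-sample comparison, restricted to the candidates.
    full_matches = [c for c in candidates if matches(test.complete, c.complete)]
    if not full_matches:
        return None
    if len(full_matches) == 1:
        return full_matches[0]

    # Secondary criteria: pick the candidate whose metadata agrees most closely.
    return max(full_matches, key=lambda c: metadata_score(test.metadata, c.metadata))


# Two catalog versions of one song share a subsample but differ later in the track.
explicit = AudioItem("Song (explicit)", "h-explicit", (1, 2, 3), (1, 2, 3, 9, 9))
clean = AudioItem("Song (clean)", "h-clean", (1, 2, 3), (1, 2, 3, 0, 0))
catalog = [explicit, clean]

unknown = AudioItem("upload", "h-unknown", (1, 2, 3), (1, 2, 3, 0, 0))
result = identify(unknown, catalog)
```

In this sketch the subsample alone cannot separate the two versions, so both become candidates and the complete-sample comparison selects the clean version. A return value of None corresponds to reporting that the test audio content item matches no catalog audio content item, as in claims 13 and 14; converting the item to a catalog item is left to the caller.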